{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88\n",
      "  return f(*args, **kwds)\n",
      "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88\n",
      "  return f(*args, **kwds)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Current dir: /Users/fernando/Google Drive/Documentos/WORK/financial/adv_financial_ml_book/notebooks\n",
      "Current dir: /Users/fernando/Google Drive/Documentos/WORK/financial/adv_financial_ml_book\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import pandas as pd \n",
    "\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.gridspec as gridspec\n",
    "%matplotlib inline\n",
    "import seaborn as sns\n",
    "plt.style.use('seaborn-talk')\n",
    "plt.style.use('bmh')\n",
    "plt.rcParams['font.weight'] = 'medium'\n",
    "#plt.rcParams['figure.figsize'] = 10,7\n",
    "blue, green, red, purple, gold, teal = sns.color_palette('colorblind', 6)\n",
    "\n",
    "import os\n",
    "print(f\"Current dir: {os.getcwd()}\")\n",
    "os.chdir('..')\n",
    "print(f\"Current dir: {os.getcwd()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 3: Labeling\n",
    "___\n",
    "\n",
    "## Exercises\n",
    "**3.1** Form dollar bars for E-mini S&P 500 futures:\n",
    " - **(a)** Apply a symmetric CUSUM filter (Chapter 2, Section 2.5.2.1) where the threshold is the standard deviation of daily returns (Snippet 3.1).\n",
    " - **(b)** Use Snippet 3.4 on a pandas series t1, where numDays=1.\n",
    " - **(c)** On those sampled features, apply the triple-barrier method, where ptSl=[1,1] and t1 is the series you created in point 1.b.\n",
    " - **(d)** Apply getBins to generate the labels.\n",
    "\n",
    "**3.2** From exercise 1, use Snippet 3.8 to drop rare labels.\n",
    "\n",
    "**3.3** Adjust the getBins function (Snippet 3.5) to return a 0 whenever the vertical barrier is the one touched first.\n",
    "\n",
    "**3.4** Develop a trend-following strategy based on a popular technical analysis statistic (e.g., crossing moving averages). For each observation, the model suggests a side, but not a size of the bet.\n",
    " - **(a)** Derive meta-labels for ptSl=[1,2] and t1 where numDays=1. Use as trgt the daily standard deviation as computed by Snippet 3.1.\n",
    " - **(b)** Train a random forest to decide whether to trade or not. Note: The decision is whether to trade or not, {0,1}, since the underlying model (the crossing moving average) has decided the side, {−1,1}.\n",
    "\n",
    "**3.5** Develop a mean-reverting strategy based on Bollinger bands. For each observation, the model suggests a side, but not a size of the bet.\n",
    " - **(a)** Derive meta-labels for ptSl=[0,2] and t1 where numDays=1. Use as trgt the daily standard deviation as computed by Snippet 3.1.\n",
    " - **(b)** Train a random forest to decide whether to trade or not. Use as features: volatility, serial correlation, and the crossing moving averages from exercise 2.\n",
    " - **(c)** What is the accuracy of predictions from the primary model (i.e., if the secondary model does not filter the bets)? What are the precision, recall, and F1-scores?\n",
    " - **(d)** What is the accuracy of predictions from the secondary model? What are the precision, recall, and F1-scores?\n",
    "\n",
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "clean_data_path = 'data/processed/clean_IVE_tickbidask.parq'\n",
    "df = pd.read_parquet(clean_data_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "## Solutions\n",
    "\n",
    "**3.1** Form dollar bars for E-mini S&P 500 futures:\n",
    "- **(a)** Apply a symmetric CUSUM filter (Chapter 2, Section 2.5.2.1) where the threshold is the standard deviation of daily returns (Snippet 3.1).\n",
    "- **(b)** Use Snippet 3.4 on a pandas series t1, where numDays=1.\n",
    "- **(c)** On those sampled features, apply the triple-barrier method, where ptSl=[1,1] and t1 is the series you created in point 1.b.\n",
    "- **(d)** Apply getBins to generate the labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## form dolar bars \n",
    "from src.features.bars_computation import dollar_bar_df\n",
    "dbars = dollar_bar_df(df, 'dv', 1_000_000).drop_duplicates().dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "price_one_day = df.price['2009-09-28']\n",
    "dolar_bar_one_day = dbars.price['2009-09-28']\n",
    "\n",
    "plt.figure(figsize=(20,10))\n",
    "plt.plot(price_one_day,marker=\"*\")\n",
    "plt.plot(dolar_bar_one_day, 's', marker=\"^\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "from src.snippets.ch3 import getDailyVol"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>price</th>\n",
       "      <th>bid</th>\n",
       "      <th>ask</th>\n",
       "      <th>size</th>\n",
       "      <th>v</th>\n",
       "      <th>dv</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dates</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2009-09-28 09:46:35</th>\n",
       "      <td>51.07</td>\n",
       "      <td>51.05</td>\n",
       "      <td>51.07</td>\n",
       "      <td>900</td>\n",
       "      <td>900</td>\n",
       "      <td>45963.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009-09-28 09:53:49</th>\n",
       "      <td>51.14</td>\n",
       "      <td>51.13</td>\n",
       "      <td>51.14</td>\n",
       "      <td>2000</td>\n",
       "      <td>2000</td>\n",
       "      <td>102280.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009-09-28 09:55:26</th>\n",
       "      <td>51.14</td>\n",
       "      <td>51.11</td>\n",
       "      <td>51.14</td>\n",
       "      <td>100</td>\n",
       "      <td>100</td>\n",
       "      <td>5114.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009-09-28 10:02:52</th>\n",
       "      <td>51.25</td>\n",
       "      <td>51.24</td>\n",
       "      <td>51.26</td>\n",
       "      <td>4300</td>\n",
       "      <td>4300</td>\n",
       "      <td>220375.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009-09-28 10:10:21</th>\n",
       "      <td>51.29</td>\n",
       "      <td>51.28</td>\n",
       "      <td>51.29</td>\n",
       "      <td>4500</td>\n",
       "      <td>4500</td>\n",
       "      <td>230805.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     price    bid    ask  size     v        dv\n",
       "dates                                                         \n",
       "2009-09-28 09:46:35  51.07  51.05  51.07   900   900   45963.0\n",
       "2009-09-28 09:53:49  51.14  51.13  51.14  2000  2000  102280.0\n",
       "2009-09-28 09:55:26  51.14  51.11  51.14   100   100    5114.0\n",
       "2009-09-28 10:02:52  51.25  51.24  51.26  4300  4300  220375.0\n",
       "2009-09-28 10:10:21  51.29  51.28  51.29  4500  4500  230805.0"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dbars.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [],
   "source": [
    "close = dbars.price.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "close = close.drop_duplicates()\n",
    "close = close[~close.index.duplicated(keep='first')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3.2** From exercise 1, use Snippet 3.8 to drop rare labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3.3** Adjust the getBins function (Snippet 3.5) to return a 0 whenever the vertical barrier is the one touched first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3.4** Develop a trend-following strategy based on a popular technical analysis statistic (e.g., crossing moving averages). For each observation, the model suggests a side, but not a size of the bet.\n",
    "- **(a)** Derive meta-labels for ptSl=[1,2] and t1 where numDays=1. Use as trgt the daily standard deviation as computed by Snippet 3.1.\n",
    "- **(b)** Train a random forest to decide whether to trade or not. Note: The decision is whether to trade or not, {0,1}, since the underlying model (the crossing moving average) has decided the side, {−1,1}."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3.5** Develop a mean-reverting strategy based on Bollinger bands. For each observation, the model suggests a side, but not a size of the bet.\n",
    "- **(a)** Derive meta-labels for ptSl=[0,2] and t1 where numDays=1. Use as trgt the daily standard deviation as computed by Snippet 3.1.\n",
    "- **(b)** Train a random forest to decide whether to trade or not. Use as features: volatility, serial correlation, and the crossing moving averages from exercise 2.\n",
    "- **(c)** What is the accuracy of predictions from the primary model (i.e., if the secondary model does not filter the bets)? What are the precision, recall, and F1-scores?\n",
    "- **(d)** What is the accuracy of predictions from the secondary model? What are the precision, recall, and F1-scores?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
